{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# 1. Installing the dependencies"
      ],
      "metadata": {
        "id": "suxMDxPezXG-"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "collapsed": true,
        "id": "hMq04i_FJIvI",
        "outputId": "5bb8b488-917f-4931-87f6-1224b6aa4074"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (2.32.3)\n",
            "Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.11/dist-packages (4.13.4)\n",
            "Collecting duckduckgo-search\n",
            "  Downloading duckduckgo_search-8.1.1-py3-none-any.whl.metadata (16 kB)\n",
            "Requirement already satisfied: tenacity in /usr/local/lib/python3.11/dist-packages (8.5.0)\n",
            "Collecting mirascope[openai]\n",
            "  Downloading mirascope-1.25.4-py3-none-any.whl.metadata (8.5 kB)\n",
            "Requirement already satisfied: docstring-parser<1.0,>=0.15 in /usr/local/lib/python3.11/dist-packages (from mirascope[openai]) (0.16)\n",
            "Requirement already satisfied: jiter>=0.5.0 in /usr/local/lib/python3.11/dist-packages (from mirascope[openai]) (0.10.0)\n",
            "Requirement already satisfied: pydantic<3.0,>=2.7.4 in /usr/local/lib/python3.11/dist-packages (from mirascope[openai]) (2.11.7)\n",
            "Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.11/dist-packages (from mirascope[openai]) (4.14.1)\n",
            "Requirement already satisfied: openai<2,>=1.6.0 in /usr/local/lib/python3.11/dist-packages (from mirascope[openai]) (1.93.0)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests) (3.4.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests) (3.10)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests) (2.4.0)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests) (2025.6.15)\n",
            "Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.11/dist-packages (from beautifulsoup4) (2.7)\n",
            "Requirement already satisfied: click>=8.1.8 in /usr/local/lib/python3.11/dist-packages (from duckduckgo-search) (8.2.1)\n",
            "Collecting primp>=0.15.0 (from duckduckgo-search)\n",
            "  Downloading primp-0.15.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)\n",
            "Requirement already satisfied: lxml>=5.3.0 in /usr/local/lib/python3.11/dist-packages (from duckduckgo-search) (5.4.0)\n",
            "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.11/dist-packages (from openai<2,>=1.6.0->mirascope[openai]) (4.9.0)\n",
            "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.11/dist-packages (from openai<2,>=1.6.0->mirascope[openai]) (1.9.0)\n",
            "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.11/dist-packages (from openai<2,>=1.6.0->mirascope[openai]) (0.28.1)\n",
            "Requirement already satisfied: sniffio in /usr/local/lib/python3.11/dist-packages (from openai<2,>=1.6.0->mirascope[openai]) (1.3.1)\n",
            "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.11/dist-packages (from openai<2,>=1.6.0->mirascope[openai]) (4.67.1)\n",
            "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3.0,>=2.7.4->mirascope[openai]) (0.7.0)\n",
            "Requirement already satisfied: pydantic-core==2.33.2 in /usr/local/lib/python3.11/dist-packages (from pydantic<3.0,>=2.7.4->mirascope[openai]) (2.33.2)\n",
            "Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3.0,>=2.7.4->mirascope[openai]) (0.4.1)\n",
            "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.11/dist-packages (from httpx<1,>=0.23.0->openai<2,>=1.6.0->mirascope[openai]) (1.0.9)\n",
            "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.11/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai<2,>=1.6.0->mirascope[openai]) (0.16.0)\n",
            "Downloading duckduckgo_search-8.1.1-py3-none-any.whl (18 kB)\n",
            "Downloading primp-0.15.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.3/3.3 MB\u001b[0m \u001b[31m33.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading mirascope-1.25.4-py3-none-any.whl (373 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m373.2/373.2 kB\u001b[0m \u001b[31m20.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hInstalling collected packages: primp, duckduckgo-search, mirascope\n",
            "Successfully installed duckduckgo-search-8.1.1 mirascope-1.25.4 primp-0.15.0\n"
          ]
        }
      ],
      "source": [
        "!pip install \"mirascope[openai]\""
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## OpenAI  Key\n",
        "To get an OpenAI API key, visit https://platform.openai.com/settings/organization/api-keys and generate a new key. If you're a new user, you may need to add billing details and make a minimum payment of $5 to activate API access."
      ],
      "metadata": {
        "id": "mZvQi54nzc7t"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "from getpass import getpass\n",
        "os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ncRm0BGOJubm",
        "outputId": "2d19347a-db9a-47e4-e01d-00636efee701"
      },
      "execution_count": 11,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Enter OpenAI API Key: ··········\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import warnings\n",
        "warnings.filterwarnings(\"ignore\")"
      ],
      "metadata": {
        "id": "_1LQzggVOSC9"
      },
      "execution_count": 13,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 2. Defining the list of customer reviews"
      ],
      "metadata": {
        "id": "1ckMUnZZzq1w"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "customer_reviews = [\n",
        "    \"Sound quality is amazing!\",\n",
        "    \"Audio is crystal clear and very immersive.\",\n",
        "    \"Incredible sound, especially the bass response.\",\n",
        "    \"Battery doesn't last as advertised.\",\n",
        "    \"Needs charging too often.\",\n",
        "    \"Battery drains quickly — not ideal for travel.\",\n",
        "    \"Setup was super easy and straightforward.\",\n",
        "    \"Very user-friendly, even for my parents.\",\n",
        "    \"Simple interface and smooth experience.\",\n",
        "    \"Feels cheap and plasticky.\",\n",
        "    \"Build quality could be better.\",\n",
        "    \"Broke within the first week of use.\",\n",
        "    \"People say they can’t hear me during calls.\",\n",
        "    \"Mic quality is terrible on Zoom meetings.\",\n",
        "    \"Great product for the price!\"\n",
        "]\n"
      ],
      "metadata": {
        "id": "0onbpNdQPS0f"
      },
      "execution_count": 31,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 3. Defining a Pydantic Schema\n",
        "This Pydantic model defines the structure for the response of a semantic deduplication task on customer reviews. This schema helps structure and validate the output of a language model tasked with clustering or deduplicating natural language input (e.g., user feedback, bug reports, product reviews)."
      ],
      "metadata": {
        "id": "0QgBZV1Nz8vL"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from pydantic import BaseModel, Field\n",
        "\n",
        "class DeduplicatedReviews(BaseModel):\n",
        "    duplicates: list[list[str]] = Field(\n",
        "        ..., description=\"A list of semantically equivalent customer review groups\"\n",
        "    )\n",
        "    reviews: list[str] = Field(\n",
        "        ..., description=\"The deduplicated list of core customer feedback themes\"\n",
        "    )\n"
      ],
      "metadata": {
        "id": "AlQpv_XHPU4B"
      },
      "execution_count": 32,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 4. Defining a Mirascope @openai.call for Semantic Deduplication\n",
        "This code defines a semantic deduplication function using Mirascope's @openai.call decorator, which enables seamless integration with OpenAI's gpt-4o model. The deduplicate_customer_reviews function takes a list of customer reviews and uses a structured prompt—defined by the @prompt_template decorator—to guide the LLM in identifying and grouping semantically similar reviews.\n",
        "\n",
        "The system message instructs the model to analyze the meaning, tone, and intent behind each review, clustering those that convey the same feedback even if worded differently. The function expects a structured response conforming to the DeduplicatedReviews Pydantic model, which includes two outputs: a list of unique, deduplicated review sentiments, and a list of grouped duplicates.\n",
        "\n",
        "This design ensures that the LLM's output is both accurate and machine-readable, making it ideal for customer feedback analysis, survey deduplication, or product review clustering."
      ],
      "metadata": {
        "id": "INtbqaZ00JYl"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from mirascope.core import openai, prompt_template\n",
        "\n",
        "@openai.call(model=\"gpt-4o\", response_model=DeduplicatedReviews)\n",
        "@prompt_template(\n",
        "    \"\"\"\n",
        "    SYSTEM:\n",
        "    You are an AI assistant helping to analyze customer reviews.\n",
        "    Your task is to group semantically similar reviews together — even if they are worded differently.\n",
        "\n",
        "    - Use your understanding of meaning, tone, and implication to group duplicates.\n",
        "    - Return two lists:\n",
        "      1. A deduplicated list of the key distinct review sentiments.\n",
        "      2. A list of grouped duplicates that share the same underlying feedback.\n",
        "\n",
        "    USER:\n",
        "    {reviews}\n",
        "    \"\"\"\n",
        ")\n",
        "def deduplicate_customer_reviews(reviews: list[str]): ...\n"
      ],
      "metadata": {
        "id": "biRt-LfvPVl3"
      },
      "execution_count": 34,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "This code executes the deduplicate_customer_reviews function using a list of customer reviews and prints the structured output. First, it calls the function and stores the result in the response variable. To ensure that the model's output conforms to the expected format, it uses an assert statement to validate that the response is an instance of the DeduplicatedReviews Pydantic model.\n",
        "\n",
        "Once validated, it prints the deduplicated results in two sections. The first section, labeled \"✅ Distinct Customer Feedback,\" displays the list of unique review sentiments identified by the model. The second section, \"🌀 Grouped Duplicates,\" lists clusters of reviews that were recognized as semantically equivalent."
      ],
      "metadata": {
        "id": "72UhAPfB0pyg"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "response = deduplicate_customer_reviews(customer_reviews)\n",
        "\n",
        "# Ensure response format\n",
        "assert isinstance(response, DeduplicatedReviews)\n",
        "\n",
        "# Print Output\n",
        "print(\"✅ Distinct Customer Feedback:\")\n",
        "for item in response.reviews:\n",
        "    print(\"-\", item)\n",
        "\n",
        "print(\"\\n🌀 Grouped Duplicates:\")\n",
        "for group in response.duplicates:\n",
        "    print(\"-\", group)\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "MiM9hGgSPc9M",
        "outputId": "fd6ad95d-96e0-42ac-c455-561fc52e230c"
      },
      "execution_count": 35,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "✅ Distinct Customer Feedback:\n",
            "- Sound quality is amazing!\n",
            "- Battery doesn't last as advertised.\n",
            "- Setup was super easy and straightforward.\n",
            "- Feels cheap and plasticky.\n",
            "- People say they can’t hear me during calls.\n",
            "- Great product for the price!\n",
            "\n",
            "🌀 Grouped Duplicates:\n",
            "- ['Sound quality is amazing!', 'Audio is crystal clear and very immersive.', 'Incredible sound, especially the bass response.']\n",
            "- [\"Battery doesn't last as advertised.\", 'Needs charging too often.', 'Battery drains quickly — not ideal for travel.']\n",
            "- ['Setup was super easy and straightforward.', 'Very user-friendly, even for my parents.', 'Simple interface and smooth experience.']\n",
            "- ['Feels cheap and plasticky.', 'Build quality could be better.', 'Broke within the first week of use.']\n",
            "- ['People say they can’t hear me during calls.', 'Mic quality is terrible on Zoom meetings.']\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The output shows a clean summary of customer feedback by grouping semantically similar reviews. The Distinct Customer Feedback section highlights key insights, while the Grouped Duplicates section captures different phrasings of the same sentiment. This helps eliminate redundancy and makes the feedback easier to analyze."
      ],
      "metadata": {
        "id": "21jkQvh21BWO"
      }
    }
  ]
}